Zipf's Law in Literature

Authors

  • L. L. Gonçalves
  • L. B. Gonçalves
Abstract

We present in this paper a numerical investigation of literary texts by various well-known English writers, covering the first half of the twentieth century, based upon the results obtained through corpus analysis of the texts. A fractal power law is obtained for the lexical wealth, defined as the ratio between the number of different words and the total number of words of a given text. By taking as the signature of each author the exponent and the amplitude of the power law, together with the standard deviation of the lexical wealth, it is possible to discriminate between works of different genres and writers, and to show that each writer has a very distinct signature, whether considered among other literary writers or compared with writers of non-literary texts. It is also shown that, for a given author, the signature is able to discriminate between short stories and novels.

1. INTRODUCTION

The power law distribution of events introduced by the Italian economist Pareto [1], in the context of the wealth of nations and individuals, and restated by Zipf [2] for linguistics, is perhaps the most ubiquitous law in nature. These power-law statistics, which are characteristic of fractal behaviour, are present in many different areas, ranging from physics [3] and biology [4,5] to natural hazards [6,7,8], and from musical creative context [9] to economics [10]. Several applications in linguistics have already been made, but the literary aspects analysed until now are all related to word formation within the literary texts [12,13]. To the best of our knowledge, no attempt has been made to verify whether a Zipf-type law exists within the literary context.

By using the software WordSmith, from Oxford University Press, with electronic texts, one is able to obtain the types/tokens ratio, types being the number of different words in a text and tokens the total number of words in that text. Considering the fact that the types/tokens ratio will necessarily decrease as the tokens increase, we were led to think of a power law distribution, and therefore of relating this behaviour to Zipf's laws. Although the use of a rich vocabulary is not the sole indication of creativity in a writer, lexical wealth is certainly one characteristic to be expected in a literary writer of the stature of J. others. As a hypothesis, one would expect to find in literary texts a high rate of …
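The types/tokens ratio and its decay with text length can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' procedure: the regex tokenizer stands in for WordSmith, the function names are hypothetical, and the functional form ratio ≈ A · N^(−α), fitted by least squares on a log-log scale, is one simple way to extract an amplitude and an exponent.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_wealth_curve(tokens, step):
    """Return (n_tokens, types/tokens) pairs for growing prefixes of the text."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        # Sample the ratio every `step` tokens and once more at the end of the text.
        if i % step == 0 or i == len(tokens):
            points.append((i, len(seen) / i))
    return points


def fit_power_law(points):
    """Fit log(ratio) = log(A) - alpha * log(N) by least squares; return (A, alpha).

    Needs at least two points with different token counts.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(r) for _, r in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope
```

For a real text, one would call `fit_power_law(lexical_wealth_curve(tokenize(text), step=1000))` and treat the returned amplitude and exponent, together with the standard deviation of the sampled ratios, as a candidate signature in the sense described in the abstract. The sampling step of 1000 tokens is an arbitrary choice.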


Similar resources

How Popular is Your Paper? An Empirical Study of the Citation Distribution

Numerical data for the distribution of citations are examined for: (i) papers published in 1981 in journals which are catalogued by the Institute for Scientific Information (783,339 papers) and (ii) 20 years of publications in Physical Review D, vols. 11-50 (24,296 papers). A Zipf plot of the number of citations to a given paper versus its citation rank appears to be consistent with a power-law...


Strategy for Investments from Zipf Law(s)

We have applied the Zipf method to extract the ζ′ exponent for seven financial indices (DAX, FTSE; DJIA, NASDAQ, S&P500; Hang-Seng and Nikkei 225), after having translated the signals into a text based on two letters. We follow considerations based on the signal Hurst exponent and the notion of a time dependent Zipf law and exponent in order to implement two simple investment strategies for su...


Generalized (m,k)-Zipf law for fractional Brownian motion-like time series with or without effect of an additional linear trend

We have translated fractional Brownian motion (FBM) signals into a text based on two “letters”, as if the signal fluctuations correspond to a constant step size random walk. We have applied the Zipf method to extract the ζ′ exponent relating the word frequency and its rank on a log-log plot. We have studied the variation of the Zipf exponent(s) giving the relationship between the frequency of occ...


A Trend Analysis of Information Management Research by Bibliometric Methodology, 1957 - 2008

This paper describes a trend analysis of international periodical literature with “Information management” in the title, indexed in the SSCI database from 1957 to 2008. The results indicate that literature production under the information management title entered a period of fluctuation in the last decade. Most documents are articles, constituting 54.27% of the total literature...


Investigation of the Zipf-plot of the extinct Meroitic language

The ancient and extinct language Meroitic is investigated using Zipf’s Law. In particular, since Meroitic is still undeciphered, the Zipf law analysis allows us to assess the quality of current texts and possible avenues for future investigation using statistical techniques.




Publication date: 2005